
Conversation


@HaloKim HaloKim commented Sep 25, 2025

Add support for Qwen3 MoE conversion.

Modified files:

  • mergekit/architecture/moe_defs.py
  • mergekit/moe/__init__.py
  • mergekit/moe/qwen.py
  • mergekit/moe/qwen3.py
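For context, a minimal sketch of the tensor layout a Qwen3 MoE conversion has to produce, assuming the Hugging Face Qwen3 MoE naming (a per-layer router weight at mlp.gate.weight plus gate_proj/up_proj/down_proj weights per expert). The helper below is purely illustrative, not mergekit's actual interface:

def qwen3_moe_layer_tensor_names(layer_idx: int, num_experts: int) -> list[str]:
    # Illustrative only: enumerate the MoE tensors one decoder layer needs.
    prefix = f"model.layers.{layer_idx}.mlp"
    names = [f"{prefix}.gate.weight"]  # router weight for this layer
    for expert_idx in range(num_experts):
        for proj in ("gate_proj", "up_proj", "down_proj"):
            names.append(f"{prefix}.experts.{expert_idx}.{proj}.weight")
    return names

# e.g. a layer with 128 experts yields 1 + 128 * 3 = 385 MoE tensors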


github-actions bot commented Sep 25, 2025


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the same format as below.


I have read the CLA Document and I hereby sign the CLA


1 out of 2 committers have signed the CLA.
✅ [HaloKim](https://github.com/HaloKim)
❌ @dev7halo
dev7halo does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

cursor[bot]

This comment was marked as outdated.

@HaloKim
Author

HaloKim commented Sep 25, 2025

I have read the CLA Document and I hereby sign the CLA

@HaloKim
Author

HaloKim commented Sep 25, 2025

recheck

cursor[bot]

This comment was marked as outdated.

Comment on lines +91 to +93

# Add expert weights
for expert_idx in range(num_experts):


The parameter order in the expert loops is inconsistent between implementations: Qwen3MoeModuleArchitecture uses up_proj, gate_proj, down_proj, while KORMoMoeModuleArchitecture uses gate_proj, up_proj, down_proj. Align the two on a single order so expert weights are processed consistently across architectures.
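One way to pin the order down, sketched under the assumption that both architecture classes can share a module-level constant (the helper below is hypothetical, not mergekit's actual interface):

# Define the projection order once so both Qwen3MoeModuleArchitecture and
# KORMoMoeModuleArchitecture emit expert weights in the same sequence.
EXPERT_PROJ_ORDER = ("gate_proj", "up_proj", "down_proj")

def expert_weight_names(prefix: str, num_experts: int) -> list[str]:
    # prefix would be something like "model.layers.0.mlp"
    return [
        f"{prefix}.experts.{expert_idx}.{proj}.weight"
        for expert_idx in range(num_experts)
        for proj in EXPERT_PROJ_ORDER
    ]

Keeping the order in a single tuple means any future reordering only has to happen in one place.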

Spotted by Graphite Agent


cursor[bot]

This comment was marked as outdated.

tqdm.tqdm(router_weights, desc="Router weights")
):
writer.save_tensor(
f"model.layers.{layer_idx}.mlp.gate.linear.weight",


Bug: Router Gate Weight Naming Mismatch

The KORMoMoeModuleArchitecture expects router gate weights to be named mlp.gate.weight, but the KORMoMoE saving logic adds a .linear suffix, resulting in mlp.gate.linear.weight. This naming inconsistency prevents the model from loading correctly.
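For illustration, a small PyTorch sketch of why the extra .linear segment changes the state-dict key; the module layout here is an assumption, not the actual KORMo modeling code:

import torch.nn as nn

class PlainGateMlp(nn.Module):
    # Router stored directly as a Linear named `gate`:
    # its weight key ends in "gate.weight".
    def __init__(self, hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(hidden, num_experts, bias=False)

class GateWrapper(nn.Module):
    # Router that wraps a Linear in an attribute named `linear`:
    # its weight key ends in "gate.linear.weight" instead.
    def __init__(self, hidden: int, num_experts: int):
        super().__init__()
        self.linear = nn.Linear(hidden, num_experts, bias=False)

class WrappedGateMlp(nn.Module):
    def __init__(self, hidden: int, num_experts: int):
        super().__init__()
        self.gate = GateWrapper(hidden, num_experts)

print(list(PlainGateMlp(8, 4).state_dict()))    # ['gate.weight']
print(list(WrappedGateMlp(8, 4).state_dict()))  # ['gate.linear.weight']

Whichever convention the modeling file uses, the conversion code has to save the router weight under exactly that key.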


tqdm.tqdm(router_weights, desc="Router weights")
):
writer.save_tensor(
f"model.layers.{layer_idx}.mlp.gate.linear.weight",


The tensor path in kormo.py should be model.layers.{layer_idx}.mlp.gate.weight rather than model.layers.{layer_idx}.mlp.gate.linear.weight, to stay consistent with the MoEGate implementation in the modeling file. The current path saves the router weights under the wrong key, making them inaccessible to the model at load time.

Suggested change
- f"model.layers.{layer_idx}.mlp.gate.linear.weight",
+ f"model.layers.{layer_idx}.mlp.gate.weight",

Spotted by Graphite Agent

